

Search for: All records

Creators/Authors contains: "Liu, T"


  1. Spatial transcriptomics (ST) has emerged as a powerful technology for bridging histology imaging with gene expression profiling. However, its application has been limited by low throughput and the need for specialized experimental facilities. Prior works sought to predict ST from whole-slide histology images to accelerate this process, but they suffer from two major limitations. First, they do not explicitly model cell-cell interaction as they factorize the joint distribution of whole-slide ST data and predict the gene expression of each spot independently. Second, their encoders struggle with memory constraints due to the large number of spots (often exceeding 10,000) in typical ST datasets. Herein, we propose STFlow, a flow matching generative model that considers cell-cell interaction by modeling the joint distribution of gene expression of an entire slide. It also employs an efficient slide-level encoder with local spatial attention, enabling whole-slide processing without excessive memory overhead. On the recently curated HEST-1k and STImage-1K4M benchmarks, STFlow substantially outperforms state-of-the-art baselines and achieves over 18% relative improvements over the pathology foundation models. 
    Free, publicly-accessible full text available June 18, 2026
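     The flow-matching objective underlying STFlow can be sketched in a few lines. This toy example is illustrative only (it is not the paper's code, and it omits the slide-level encoder and local spatial attention): a model is trained to match the velocity of a straight-line interpolant between noise samples and data samples, here standing in for whole-slide gene expression.

     ```python
     import numpy as np

     def flow_matching_loss(x0, x1, t, vector_field):
         """Toy conditional flow-matching loss.

         x0: noise samples, shape (N, D)
         x1: data samples (e.g. spot gene expression), shape (N, D)
         t:  times in [0, 1], shape (N,)
         The model should predict the straight-line target velocity
         (x1 - x0) along the interpolant x_t = (1 - t) x0 + t x1.
         """
         xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1  # linear interpolant
         target = x1 - x0                                # target velocity
         pred = vector_field(xt, t)
         return float(np.mean((pred - target) ** 2))

     # Sanity check with a "model" that already knows the true velocity.
     x0 = np.zeros((4, 3))
     x1 = np.ones((4, 3))
     t = np.linspace(0.1, 0.9, 4)
     perfect = lambda xt, t: np.ones_like(xt)  # equals x1 - x0 here
     loss = flow_matching_loss(x0, x1, t, perfect)
     ```

     Modeling the joint distribution of all spots means `x1` would hold the entire slide's expression matrix, which is what motivates the memory-efficient encoder described in the abstract.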
  2. Large language models (LLMs) have achieved impressive performance but face high computational costs and latency, limiting their deployment in resource-constrained settings. In contrast, small-scale LLMs (SLMs) are more efficient yet struggle to capture evolving real-world knowledge. Retrieval-augmented generation (RAG) helps by integrating external knowledge, but imperfect retrieval can introduce distracting noise that misleads SLMs. We propose a robust RAG framework for SLMs built on margin-aware preference optimization. The framework employs multi-turn prompting for detailed reasoning, rejection sampling for high-quality explanations, and contrastive preference selection to refine responses by maximizing the likelihood gap between preferred and non-preferred outputs. 
    Free, publicly-accessible full text available July 17, 2026
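     A margin-aware preference objective can be sketched as a DPO-style logistic loss with a target margin on the log-likelihood gap between preferred and non-preferred responses. The exact loss form, the `margin` and `beta` parameters, and the function name below are my illustrative assumptions, not the paper's definition.

     ```python
     import math

     def margin_preference_loss(logp_pref, logp_rej, margin=1.0, beta=1.0):
         """Penalize cases where the log-likelihood gap between the
         preferred and rejected response falls below a target margin.
         Logistic loss on the margin-shifted gap, DPO-style."""
         gap = beta * (logp_pref - logp_rej)
         return math.log(1.0 + math.exp(-(gap - margin)))
     ```

     A larger gap yields a smaller loss, so minimizing it pushes the model to prefer the selected response by at least the margin rather than by any positive amount.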
  3. Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora, and their knowledge can become outdated quickly in the fast-changing world. This motivates the development of knowledge editing (KE) to update specific knowledge in LLMs without changing unrelated knowledge or compromising their pre-trained capabilities. Previous efforts sought to update a small number of parameters of an LLM and proved effective at making selective updates. Nonetheless, the edited LLM often exhibits a degraded ability to reason about the new knowledge. In this work, we identify a key issue: heterogeneous token overfitting (HTO), where the LLM overfits different tokens in the provided knowledge at varying rates. To tackle this, we propose a token-level smoothing method that mitigates HTO by adaptively refining the target distribution. Theoretically, the method offers better parameter updates with negligible computational overhead. It also induces an implicit DPO objective but does not require preference data pairs. Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method. 
    Free, publicly-accessible full text available July 17, 2026
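     Token-level adaptive smoothing can be illustrated with a simple rule (an assumption of mine; the paper's actual refinement of the target distribution may differ): mix each one-hot target with a uniform distribution, smoothing more aggressively for tokens the model already fits with high confidence, so that fast-fitting tokens stop overfitting while slow-fitting tokens keep a sharp target.

     ```python
     import numpy as np

     def smoothed_targets(onehot, token_conf, eps_max=0.2):
         """Per-token adaptive label smoothing.

         onehot:     (T, V) one-hot target distributions
         token_conf: (T,) current model probability of each gold token
         Tokens with higher confidence get a larger smoothing rate.
         """
         eps = eps_max * token_conf[:, None]   # per-token smoothing rate
         V = onehot.shape[1]
         return (1.0 - eps) * onehot + eps / V  # mix with uniform

     onehot = np.eye(4)[[2, 0]]            # two tokens, vocab size 4
     conf = np.array([0.9, 0.1])           # token 0 is nearly overfit
     targets = smoothed_targets(onehot, conf)
     ```

     Each row remains a valid distribution, and the nearly-overfit token's gold-label target is pulled further from 1.0 than the uncertain token's.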
  4. Free, publicly-accessible full text available January 1, 2026
  5. Growing interconnect bandwidth demand in large datacenters requires energy-efficient optical transceivers that operate with four-level pulse amplitude modulation (PAM4) to enable high per-wavelength data rates. Further increases in bandwidth density are possible by leveraging wavelength-division multiplexing (WDM), which optical link architectures based on silicon photonic microring modulators (MRMs) and drop filters inherently enable. This paper presents high-speed PAM4 transmitter and receiver front-ends implemented in a 28nm CMOS process that are co-designed with these silicon photonic optical devices to enable energy-efficient operation. The transmitter utilizes an optical digital-to-analog converter (DAC) approach with two PAM2 AC-coupled pulsed-cascode high-swing voltage-mode output stages to drive the MRM MSB/LSB segments. A 3.42Vppd output swing is achieved when operating at 80Gb/s PAM4 with an energy efficiency of 3.66pJ/bit. The receiver front-end interfaces with a silicon-germanium avalanche photodiode (APD) and utilizes a low-bandwidth input transimpedance amplifier followed by continuous-time linear equalizer and variable-gain amplifier stages. Biasing the APD to realize a gain of 2 allows for -7dBm optical modulation amplitude (OMA) sensitivity at 56Gb/s PAM4 with a BER of 10^-4 and an energy efficiency of 1.61pJ/bit. Experimental verification of the full PAM4 transceiver at 50Gb/s operation shows -4.66dBm OMA sensitivity at a BER of approximately 4x10^-4. 
    Free, publicly-accessible full text available April 21, 2026
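     The optical-DAC transmitter idea (forming a PAM4 symbol from two PAM2 bit streams that drive binary-weighted MSB and LSB modulator segments) reduces to simple arithmetic. The straight-binary level mapping and 2x segment weighting below are illustrative assumptions; practical PAM4 links typically Gray-code the levels.

     ```python
     def pam4_level(msb, lsb):
         """One PAM4 amplitude level (0..3) from the two PAM2 segment
         drives, with the MSB segment weighted twice the LSB segment."""
         return 2 * msb + lsb

     levels = [pam4_level(m, l) for m, l in [(0, 0), (0, 1), (1, 0), (1, 1)]]

     # PAM4 carries 2 bits per symbol, so 80 Gb/s corresponds to 40 GBd.
     baud_rate = 80e9 / 2
     ```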
  6. This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of "offline" data examples. While recent years have seen a flurry of work on applying various machine learning techniques to the offline optimization problem, the majority of these works focused on learning a surrogate of the unknown objective function and then applying existing optimization algorithms. While the idea of modeling the unknown objective function is intuitive and appealing, from the learning point of view it also makes it very difficult to tune the objective of the learner according to the objective of optimization. Instead of learning and then optimizing the unknown objective function, in this paper we take on a less intuitive but more direct view that optimization can be thought of as a process of sampling from a generative model. To learn an effective generative model from the offline data examples, we consider the standard technique of "re-weighting", and our main technical contribution is a probably approximately correct (PAC) lower bound on the natural optimization objective, which allows us to jointly learn a weight function and a score-based generative model from a surrogate loss function. The robustly competitive performance of the proposed approach is demonstrated via empirical studies using the standard offline optimization benchmarks. 
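     The re-weighting idea can be sketched as exponentially tilting the empirical distribution of the offline examples toward high-objective ones. The exponential weight family and the `beta` parameter are illustrative assumptions; per the abstract, the paper instead learns the weight function jointly with a score-based generative model via its PAC lower bound.

     ```python
     import numpy as np

     def reweight(scores, beta=1.0):
         """Softmax-style sample weights that up-weight offline examples
         with higher objective values f(x).

         scores: (N,) objective values of the offline examples
         Returns normalized weights for re-weighted generative training.
         """
         w = np.exp(beta * (scores - scores.max()))  # stabilized exponent
         return w / w.sum()                          # weights sum to 1

     w = reweight(np.array([0.0, 1.0, 2.0]))
     ```

     Training a generative model on data re-weighted this way biases samples drawn from it toward the high-scoring region, which is what lets sampling double as optimization.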